fix(security): eliminate API key leakage via ps aux across all three execution layers by dumko2001 · Pull Request #1 · dumko2001/NemoClaw

dumko2001 · 2026-03-18T19:37:28Z

Summary

Closes NVIDIA#325 — API key exposed in process list via ps aux.

This PR is a comprehensive rollup that supersedes open PRs NVIDIA#148, NVIDIA#191, NVIDIA#225, and NVIDIA#330. It fixes every instance of secret leakage and shell injection that was identified during a first-principles audit of all three execution layers.

Root Cause

`openshell provider create --credential KEY=VALUE` passes the secret as a command-line argument. Every user on the machine can read it via `ps aux` or `/proc/<pid>/cmdline`. The same pattern existed in:

| Layer | File | Old pattern |
|---|---|---|
| Legacy CLI | `bin/lib/onboard.js` | ``run(`openshell provider create --credential "NVIDIA_API_KEY=${process.env.NVIDIA_API_KEY}"`)`` |
| Plugin (TS) | `nemoclaw/src/commands/onboard.ts` | ``execOpenShell(["--credential", `${credentialEnv}=${apiKey}`])`` |
| Blueprint (Python) | `nemoclaw-blueprint/orchestrator/runner.py` | `provider_args.extend(["--credential", f"OPENAI_API_KEY={credential}"])` |
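
To make the exposure concrete, here is a minimal sketch (Node.js, Linux-oriented) in which a child process reports the command line recorded for it. The `DEMO_KEY=not-a-real-secret` argument and the child script are stand-ins invented for illustration; `openshell` itself is not invoked. On systems without `/proc`, the child falls back to its own `process.argv`.

```javascript
const { spawnSync } = require("node:child_process");

// Child script (hypothetical stand-in for any CLI that takes a secret in argv).
// On Linux it reads /proc/self/cmdline, the NUL-separated data that
// `ps aux` displays; elsewhere it falls back to process.argv.
const childScript = `
  const fs = require("node:fs");
  const procPath = "/proc/self/cmdline";
  const argv = fs.existsSync(procPath)
    ? fs.readFileSync(procPath, "utf8").split("\\0").filter(Boolean)
    : process.argv;
  process.stdout.write(JSON.stringify(argv));
`;

// Dummy value only: never put a real key here.
const leakedArg = "DEMO_KEY=not-a-real-secret";

const result = spawnSync(process.execPath, ["-e", childScript, leakedArg], {
  encoding: "utf8",
});

// The recorded command line contains the KEY=VALUE pair verbatim.
const visibleArgv = JSON.parse(result.stdout);
console.log(visibleArgv.includes(leakedArg));
```

By default `/proc/<pid>/cmdline` is readable by every local user, whereas `/proc/<pid>/environ` is restricted to the process owner, which is why moving the secret from argv to the environment narrows the exposure.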

Fix — Three Commits

Commit 1 — `fix(runner)`: safe argv primitives + opts.env overwrite fix

- `runArgv(prog, args, opts)`: `spawnSync` without a shell; no metacharacter expansion
- `runCaptureArgv(prog, args, opts)`: `execFileSync` without a shell; returns stdout as a string
- `assertSafeName(name, label)`: validates user-supplied names against `[a-zA-Z0-9][a-zA-Z0-9_-]{0,62}`; calls `process.exit(1)` on rejection
- Fix pre-existing `opts.env` overwrite bug: `mergeEnv()` destructures `opts.env` before the rest spread so `PATH`/`HOME`/`DOCKER_HOST` are always preserved
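
The helpers listed above could look roughly like the following. This is a sketch under assumptions, not the code in `bin/lib/runner.js`: the `^...$` anchors on the pattern are assumed, and `isSafeName` is a hypothetical boolean variant of `assertSafeName` (which the PR describes as exiting the process on rejection).

```javascript
const { execFileSync } = require("node:child_process");

// Pattern quoted in the PR; the ^...$ anchors are an assumption of this sketch.
const SAFE_NAME = /^[a-zA-Z0-9][a-zA-Z0-9_-]{0,62}$/;

// Hypothetical predicate. The PR's assertSafeName(name, label) is described
// as calling process.exit(1) on rejection; returning a boolean keeps this
// sketch easy to exercise.
function isSafeName(name) {
  return typeof name === "string" && SAFE_NAME.test(name);
}

// Sketch of an argv-style capture helper: execFileSync spawns the program
// directly, so no shell ever parses the arguments.
function runCaptureArgv(prog, args, opts = {}) {
  return execFileSync(prog, args, { ...opts, encoding: "utf8" });
}

for (const name of ["my-sandbox_1", "a;b", "$(id)", "a|b", "../etc", "has space"]) {
  console.log(JSON.stringify(name), isSafeName(name));
}

// Uses the current Node binary as a stand-in program.
console.log(runCaptureArgv(process.execPath, ["-p", "1 + 1"]).trim());
```

Validating names with an allowlist pattern complements the argv change: even without a shell, a name such as `../etc` could still be misused as a path component by the program that receives it.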

Commit 2 — `fix(cli)`: shell-string → argv arrays (injection prevention)

Converts every ``run(`...`)`` call that accepted user-controlled values to an argv array, across:

- `bin/lib/onboard.js`: all openshell/bash/brew calls; `fs.cpSync`/`fs.rmSync` replace shell `cp`/`rm`
- `bin/lib/nim.js`: docker pull/rm/run/stop/inspect, plus `assertSafeName` on `sandboxName`
- `bin/lib/policies.js`: openshell policy get/set, `assertSafeName` on both `sandboxName` and `presetName`, and the temp file written with `mode: 0o600`
- `bin/nemoclaw.js`: removes inline `NVIDIA_API_KEY=VALUE` from sudo argv (superseded by `sudo -E` env inheritance); `assertSafeName` on deploy `instanceName`; sandbox connect/status/logs/destroy → `runArgv`
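
The difference between a shell string and an argv array can be shown with a small proof of concept. This sketch assumes a POSIX system with an `echo` binary on `PATH`; the hostile `sandboxName` value is invented for the demo and the calls are plain Node.js APIs rather than the PR's helpers.

```javascript
const { execSync, execFileSync } = require("node:child_process");

// Hypothetical hostile sandbox name supplied by a user.
const sandboxName = "demo; echo INJECTED";

// Old style: the value is interpolated into a string that a shell parses,
// so `;` ends the first command and starts a second one.
const viaShell = execSync(`echo ${sandboxName}`, { encoding: "utf8" });

// New style: the value is one element of an argv array; no shell is involved,
// so it reaches the program as a single literal argument.
const viaArgv = execFileSync("echo", [sandboxName], { encoding: "utf8" });

console.log(JSON.stringify(viaShell));
console.log(JSON.stringify(viaArgv));
```

In the shell variant the injected `echo INJECTED` runs as its own command, while the argv variant simply echoes the whole string back, semicolon included.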

Commit 3 — `fix(credentials)`: env-lookup form for `--credential`; secrets never in argv

Safe pattern: set the secret in `process.env` / `os.environ` before the call, then pass only the env-var name to `--credential`. openshell reads the value from the environment, never from argv.
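
The safe pattern can be sketched as follows. A small Node.js child script stands in for `openshell` here, since the real CLI is not available in a self-contained example; `runWithCredential`, `DEMO_API_KEY`, and the child script are hypothetical. Unlike the PR, which assigns to `process.env` in the parent, this variant passes a per-call `env` object to the child.

```javascript
const { spawnSync } = require("node:child_process");

// Stand-in for a CLI with an env-lookup flag: it receives only the NAME of
// the variable in argv and resolves the value from its own environment.
const childScript = `
  const i = process.argv.indexOf("--credential");
  const name = process.argv[i + 1];
  const value = process.env[name] || "";
  process.stdout.write(JSON.stringify({
    argv: process.argv.slice(1),
    name,
    valueLength: value.length,
  }));
`;

// Hypothetical helper: put the secret in the child's env, the name in argv.
function runWithCredential(prog, args, credentialEnv, secret) {
  const env = { ...process.env, [credentialEnv]: secret };
  return spawnSync(prog, [...args, "--credential", credentialEnv], {
    env,
    encoding: "utf8",
  });
}

const secret = "not-a-real-secret"; // dummy value for the demo
const result = runWithCredential(
  process.execPath,
  ["-e", childScript, "--"],
  "DEMO_API_KEY",
  secret
);
const report = JSON.parse(result.stdout);
console.log(report.name, report.valueLength, report.argv.includes(secret));
```

The child resolves the full value from its environment, yet nothing in its argument list contains the secret.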

| File | Change |
|---|---|
| `nemoclaw/src/commands/onboard.ts` | `process.env[credentialEnv] = apiKey` before `execOpenShell`; both `provider create` and `update` paths changed |
| `nemoclaw-blueprint/orchestrator/runner.py` | `target_cred_env` with type-based fallback (supersedes NVIDIA#191); `os.environ[target_cred_env] = credential`; `--credential target_cred_env` |
| `nemoclaw-blueprint/blueprint.yaml` | Add `credential_env: NVIDIA_API_KEY` to `default` profile; without it the fallback would pick `OPENAI_API_KEY` for the `nvidia` provider type |
| `nemoclaw/src/onboard/config.ts` | `writeFileSync` uses `{ mode: 0o600 }` so `config.json` is readable only by its owner |
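
For the `config.json` permission change in the last row, a minimal sketch of an owner-only write might look like this. `writePrivateJson` and the temp-directory path are hypothetical, and the `chmodSync` call is an extra step added in this sketch rather than something the PR describes.

```javascript
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

// Hypothetical helper; the PR only states that writeFileSync gets { mode: 0o600 }.
function writePrivateJson(filePath, data) {
  // `mode` applies only when the file is created (and is filtered by umask).
  fs.writeFileSync(filePath, JSON.stringify(data, null, 2), { mode: 0o600 });
  // Extra step in this sketch (not described in the PR): tighten a file
  // that already existed with looser permissions.
  fs.chmodSync(filePath, 0o600);
}

const dir = fs.mkdtempSync(path.join(os.tmpdir(), "nemoclaw-demo-"));
const configPath = path.join(dir, "config.json");
writePrivateJson(configPath, { credentialEnv: "NVIDIA_API_KEY" });

// Print the permission bits in octal.
const mode = fs.statSync(configPath).mode & 0o777;
console.log(mode.toString(8));
```

Because `writeFileSync` does not change the mode of a file that already exists, an upgrade path over a previously world-readable file may need an explicit `chmod` like the one above.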

Verification

$ node --test test/*.test.js
ℹ tests 84
ℹ pass 84
ℹ fail 0

Key tests added:

- `test/runner.test.js` (22 assertions): `assertSafeName` rejects `;`, `$()`, `|`, `../`, and spaces; `runCaptureArgv` does not expand shell metacharacters; `opts.env` preserves `PATH`
- `test/credential-exposure.test.js` (9 assertions): static scan of all 3 layers for `--credential KEY=VALUE` patterns; structural checks for `process.env[credentialEnv]` and `os.environ[target_cred_env]`; runtime injection PoC
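
A static scan of the kind described for `test/credential-exposure.test.js` could be approximated as below. The regular expression and the `findInlineCredentials` helper are assumptions made for this sketch; the PR's actual test code is not shown in this description.

```javascript
// Hypothetical scanner: flags lines where --credential is followed by a
// NAME=... pair instead of a bare env-var name. The regex is an assumption
// of this sketch, not the pattern used in the PR's test file.
const INLINE_CREDENTIAL = /--credential["'`,\s]+(?:f["'])?[\w${}]+=/;

function findInlineCredentials(source) {
  return source
    .split("\n")
    .map((text, index) => ({ line: index + 1, text }))
    .filter(({ text }) => INLINE_CREDENTIAL.test(text));
}

const sample = [
  'execOpenShell(["--credential", `${credentialEnv}=${apiKey}`]);', // old TS form
  'provider_args.extend(["--credential", f"OPENAI_API_KEY={credential}"])', // old Python form
  'execOpenShell(["--credential", credentialEnv]);', // env-lookup form
  'provider_args.extend(["--credential", target_cred_env])', // env-lookup form
].join("\n");

// Prints the 1-based line numbers that still use the KEY=VALUE form.
console.log(findInlineCredentials(sample).map((hit) => hit.line));
```

A text-level scan like this is a regression guard rather than a proof; it would miss a secret assembled across several statements, which is presumably why the PR pairs it with structural checks and a runtime PoC.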

ps aux before/after

Before (vulnerable):

openshell provider create --name nvidia-nim --type openai --credential "NVIDIA_API_KEY=nvapi-xxxxxxxx..." --config "OPENAI_BASE_URL=..."

After (safe):

openshell provider create --name nvidia-nim --type openai --credential NVIDIA_API_KEY --config OPENAI_BASE_URL=...

The secret is in the process's environment variables, not in its argv.

PRs Superseded

| PR | Description | Status |
|---|---|---|
| NVIDIA#148 | Shell injection via sandbox name | Superseded by commit 2 |
| NVIDIA#191 | Python runner credential type fallback | Superseded by commit 3 (includes + extends) |
| NVIDIA#225 | CI / non-interactive mode | Existing `isNonInteractive()` in `onboard.ts` already implements this |
| NVIDIA#330 | Credential leak in `--credential` arg | Superseded by commit 3 |

Add three argv-safe helpers to bin/lib/runner.js:

- runArgv(prog, args, opts) -- spawnSync without shell
- runCaptureArgv(prog, args, opts) -- execFileSync without shell; returns stdout
- assertSafeName(name, label) -- validates against [a-zA-Z0-9][a-zA-Z0-9_-]{0,62}

Fix pre-existing opts.env overwrite: old spread { ...opts } after the merged env silently clobbered it. mergeEnv(opts) destructures opts.env first.

test/runner.test.js: 22 new assertions (assertSafeName rejections, injection PoC, opts.env preservation).
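
The `opts.env` overwrite described in this commit message can be reproduced in a few lines. Both functions are a sketch of the described behavior, not the code in `bin/lib/runner.js`; `buggyOptions` and the `DEMO_FLAG` variable are hypothetical, and the real `mergeEnv` may return a different shape.

```javascript
// Old shape (as described in the commit message): spreading opts last lets
// a caller-supplied opts.env replace the merged env wholesale.
function buggyOptions(opts = {}) {
  return { env: { ...process.env, ...opts.env }, ...opts };
}

// Sketch of the fix: pull opts.env out first, spread the rest, then attach
// the merged env so inherited variables such as PATH survive.
function mergeEnv(opts = {}) {
  const { env: extraEnv, ...rest } = opts;
  return { ...rest, env: { ...process.env, ...extraEnv } };
}

const opts = { stdio: "pipe", env: { DEMO_FLAG: "1" } };
console.log(Object.keys(buggyOptions(opts).env)); // only the caller's keys
console.log("PATH" in mergeEnv(opts).env, mergeEnv(opts).env.DEMO_FLAG);
```

With the buggy shape, a child spawned with a custom `env` loses `PATH` and may fail to locate its executable; the destructuring order in `mergeEnv` avoids that.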

Closes shell-injection attack surface in the legacy CJS layer by replacing all user-controlled run() / runCapture() shell strings with the new argv-safe runArgv() / runCaptureArgv() helpers. assertSafeName() guards every user-supplied sandbox/instance/preset name before it enters any command.

- bin/lib/onboard.js -- all openshell/bash/brew calls -> runArgv; file copies -> fs.cpSync/fs.rmSync (no cp shell)
- bin/lib/nim.js -- docker pull/rm/run/stop/inspect -> runArgv/runCaptureArgv; assertSafeName guard on sandboxName
- bin/lib/policies.js -- openshell policy get/set -> runCaptureArgv/runArgv; assertSafeName on sandboxName and presetName; temp policy file written with mode 0o600
- bin/nemoclaw.js -- setupSpark: remove inline NVIDIA_API_KEY=VALUE from sudo argv (sudo -E already inherits env); deploy: assertSafeName on instanceName; sandbox connect/status/logs/destroy -> runArgv

Supersedes PRs: NVIDIA#148 (shell injection), part of NVIDIA#330 (credential leak).

… in argv

Fixes: NVIDIA#325 (API key exposed in process list via ps aux)
Supersedes: PRs NVIDIA#191, NVIDIA#330

The root cause: all three execution layers passed the actual credential VALUE as --credential KEY=VALUE, making it visible to any local user via `ps aux` or /proc/<pid>/cmdline.

Safe pattern: set the secret in the child's inherited env, then pass only the env-var NAME to --credential (openshell env-lookup form).

nemoclaw/src/commands/onboard.ts
- process.env[credentialEnv] = apiKey before execOpenShell
- --credential arg: credentialEnv (name only, not KEY=VALUE)
- applies to both provider create and provider update paths

nemoclaw-blueprint/orchestrator/runner.py
- Rename credential_env -> target_cred_env with type-based fallback (nvidia -> NVIDIA_API_KEY, openai -> OPENAI_API_KEY) when not set in the blueprint profile. Supersedes PR NVIDIA#191's partial fix.
- os.environ[target_cred_env] = credential before run_cmd
- --credential arg: target_cred_env (name only)

nemoclaw-blueprint/blueprint.yaml
- Add credential_env: NVIDIA_API_KEY to the default profile. Without this field the type-based fallback would silently use OPENAI_API_KEY for the nvidia provider_type, causing auth failure.

nemoclaw/src/onboard/config.ts
- writeFileSync for config.json now passes mode: 0o600 so the file containing endpoint/model/credentialEnv metadata is not world-readable.

test/credential-exposure.test.js (new file)
- Static source scan: asserts no --credential KEY=VALUE pattern in any of the 3 execution layer files (allowlists dummy/ollama stubs)
- Layer-specific structural checks (process.env set, os.environ set, blueprint default profile has credential_env)
- Runtime injection PoC: proves old bash -c IS vulnerable; new runCaptureArgv IS NOT

All 84 tests pass.

…IA#1114) (NVIDIA#1305)

## Summary

Fixes the four issues reported in NVIDIA#1114: EACCES permission errors and missing gateway token when running inside the NemoClaw sandbox.

### Issue mapping

| # | Reported error | Fix |
|---|----------------|-----|
| 1 | `EACCES: open '/sandbox/.openclaw/openclaw.json.*.tmp'` | `install_configure_guard`: intercepts `openclaw configure` with a clear error and directs users to `nemoclaw onboard --resume` on the host |
| 2 | Same as #1 (different PID) | Same fix |
| 3 | `EACCES: mkdir '/sandbox/.openclaw/credentials'` | Already resolved on main via NVIDIA#1519 (credentials symlink to `.openclaw-data/`) |
| 4 | No WhatsApp QR code | Consequence of NVIDIA#3, also resolved by NVIDIA#1519 |

### Root cause (issues 1 & 2)

OpenClaw's `configure` command performs atomic writes: it creates a temp file (`openclaw.json.PID.UUID.tmp`) in the same directory as the config. Since `/sandbox/.openclaw/` is Landlock read-only at the kernel level, file creation is rejected with EACCES. This is by design: the sandbox config is intentionally immutable at runtime. Rather than weakening Landlock (security regression), we intercept the command in the sandbox shell and guide users to the correct host-side workflow.

### Changes

**1. `install_configure_guard()`**: Writes a shell function wrapper to `.bashrc`/`.profile` that intercepts `openclaw configure` and prints:

```
Error: 'openclaw configure' cannot modify config inside the sandbox.
The sandbox config is read-only (Landlock enforced) for security.
To change your configuration, exit the sandbox and run:
nemoclaw onboard --resume
This rebuilds the sandbox with your updated settings.
```

All other `openclaw` subcommands pass through to the real binary.

**2. `export_gateway_token()`**: Reads `gateway.auth.token` from `openclaw.json` and exports it as `OPENCLAW_GATEWAY_TOKEN`, so interactive sessions (`openshell sandbox connect`) can authenticate with the gateway. Persists to `.bashrc`/`.profile` using idempotent marker blocks and cleans stale tokens on revocation.

**3. `_read_gateway_token()` helper**: Shared Python snippet used by both `export_gateway_token` and `print_dashboard_urls` (deduplication, uses `with open()` context manager).

All three are called in both root and non-root startup paths.

## Security properties preserved

- `/sandbox/.openclaw` remains root-owned, Landlock read-only
- `openclaw.json` remains chmod 444 (immutable)
- No new attack surface: token is read-only from existing config
- `command openclaw` bypass preserves all non-configure functionality

Fixes NVIDIA#1114

Signed-off-by: Dongni Yang <[email protected]>
Co-authored-by: Claude Sonnet 4.6 <[email protected]>
Co-authored-by: Claude Opus 4.6 (1M context) <[email protected]>

---------

Signed-off-by: Dongni Yang <[email protected]>
Co-authored-by: Claude Sonnet 4.6 <[email protected]>

dumko2001 force-pushed the security/harden-process-execution branch from 2afa1f4 to 47512fa on March 19, 2026 03:34

dumko2001 added 3 commits March 19, 2026 09:21

dumko2001 force-pushed the security/harden-process-execution branch from 47512fa to 892bc25 on March 19, 2026 03:57

dumko2001 wants to merge 3 commits into main from security/harden-process-execution
